[SYCL] Add RT dependency on interface layer for offloading #2
For reviewers .../ssfork_llvm$ ninja -C $build_llvm install | grep "level_zero" |
|
@tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel Would be nice to start reviewing this step earlier. |
|
kindly ping: @tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel |
| set(UMF_LINK_HWLOC_STATICALLY ON CACHE INTERNAL "static HWLOC") | ||
| endif() | ||
| fetch_adapter_source(level_zero |
You note in the PR description that "UR is considered as temporal solution until llvm-project/offload is
fully functional and is able to replace UR". I'm afraid that if this is merged, it will very well become the actual solution, which will be quite hard to remove. For example, existing UR depends on 4 adapters. Are you sure that the code for all adapters will be accepted into the llvm-project/offload project easily, or at all? I do not believe that such a temporary solution is the right approach. Instead, it's better to focus on llvm-project/offload directly, limit the scope of initial support (Intel GPUs), and go from there.
@sergey-semenov could you please help to answer this since this approach had been discussed before I joined upstreaming activity.
AFAIK, UR's presence upstream was discussed and was not really welcomed by the community, although folks did agree to start with UR.
Good point @dvrogozh
I see the email discussions and RFC discussions about this issue. But I am not able to find any communication on what we agreed on.
I don't think we've had any pushback on UR as a short-term dependency to unblock rt upstreaming in the RFCs (beyond being asked to run this by the LLVM board, which we have). I believe the current plan is to bring liboffload to functional parity with UR this year, which is when we're going to switch to it in both intel/llvm and upstream. @RaviNarayanaswamy @alycm please correct me if I'm wrong on any of this.
@sergey-semenov is correct. liboffload is being worked on; currently most of the contribution is done by CodePlay. For the short term there was no objection from the community to using UR for offloading.
Codeplay, unless CodeSourcery are helping too! :-)
But yes, we're working on liboffload. Using liboffload is the long-term goal, but it is not yet mature enough to fully support SYCL-RT.
There is a liboffload adapter in Unified Runtime, so you can run SYCL-RT --> Unified Runtime --> liboffload. We're using this to drive development and for testing. But most SYCL features don't work yet.
Hi @KseniyaTikhomirova, thanks for the ping. I will look at this today. |
| @@ -0,0 +1,26 @@ | |||
| ===================== | |||
Nit: Any reason why this is not directly under docs?
Thanks
UR is an implementation detail (design) of SYCL RT. I believe that under docs we should keep user-visible things like guides, FAQ, release notes, and so on.
libcxx also splits documents in this way https://github.com/llvm/llvm-project/tree/main/libcxx/docs
intel/llvm splitting is also very similar https://github.com/intel/llvm/blob/sycl/sycl/doc/design/UnifiedRuntime.md
|
@tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel, @sergey-semenov |
| .. _unified runtime: | ||
| Overview |
We can avoid a sub-section (Overview) here. We can add this if we add more details to this document.
Thanks
removed in 7fb24ad
| Overview | ||
| ======== | ||
| The Unified Runtime project serves as an interface layer between the SYCL |
| The Unified Runtime project serves as an interface layer between the SYCL | |
| The Unified Runtime (UR) project serves as an interface layer between the SYCL |
updated in 7fb24ad
asudarsa left a comment
Document changes look good. Couple of nits.
Thanks
LLVM prevents the sm_32_intrinsics.hpp header from being included with a #define SM_32_INTRINSICS_HPP. It also provides drop-in replacements of the functions defined in the CUDA header. One issue is that some intrinsics were added after the replacement was written, and thus have no replacement, breaking code that calls them (Raft is one example). This commit backports the code from sm_32_intrinsics.hpp for the missing intrinsics. This is the second try after PR llvm#143664 broke tests.
The function already exposes a work list to avoid deep recursion; this commit starts utilizing it in a helper that could also lead to deep recursion. We have observed this crash on `clang/test/C/C99/n590.c` with our internal builds that enable aggressive optimizations and hit the limit earlier than default release builds of Clang. See the added test for an example with a deeper recursion that used to crash in upstream Clang before this change with the following stack trace:
```
#0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Unix/Signals.inc:804:13
#1 llvm::sys::RunSignalHandlers() /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Signals.cpp:106:18
#2 SignalHandler(int, siginfo_t*, void*) /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:3
#3 (/lib/x86_64-linux-gnu/libc.so.6+0x3fdf0)
#4 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12772:0
#5 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
#6 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
#7 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
#8 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
#9 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
#10 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
#11 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
#12 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
#13 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
#14 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
#15 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
#16 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
#17 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
#18 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
#19 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
... 700+ more stack frames.
```
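The work-list idea in the commit above can be illustrated in isolation. The following is only a minimal sketch and not Clang's code: `Node`, `makeChain`, and `visitAll` are hypothetical names, and the tree is a left-nested chain that stands in for a long comma expression such as `(((a, b), c), d)`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical expression node stored in a flat vector; children are
// indices into that vector, and -1 means "no child". Not Clang's Expr.
struct Node {
  int lhs;
  int rhs;
};

// Appends a left-nested chain with `depth` binary nodes to `nodes` and
// returns the index of the root node.
int makeChain(std::vector<Node> &nodes, int depth) {
  nodes.push_back(Node{-1, -1});
  int root = static_cast<int>(nodes.size()) - 1;
  for (int i = 0; i < depth; ++i) {
    nodes.push_back(Node{-1, -1}); // right-hand leaf
    int leaf = static_cast<int>(nodes.size()) - 1;
    nodes.push_back(Node{root, leaf}); // new parent becomes the root
    root = static_cast<int>(nodes.size()) - 1;
  }
  return root;
}

// Visits every node reachable from `root` without recursion. Pending
// children go onto an explicit work list, so the nesting depth is limited
// by heap memory rather than by the call stack.
std::size_t visitAll(const std::vector<Node> &nodes, int root) {
  std::size_t visited = 0;
  std::vector<int> workList;
  workList.push_back(root);
  while (!workList.empty()) {
    int current = workList.back();
    workList.pop_back();
    ++visited;
    if (nodes[current].lhs != -1)
      workList.push_back(nodes[current].lhs);
    if (nodes[current].rhs != -1)
      workList.push_back(nodes[current].rhs);
  }
  return visited;
}
```

A recursive visitor over the same chain would need one stack frame per nesting level, which is the pattern visible in the repeated `CheckCommaOperand` / `AnalyzeImplicitConversions` frames of the trace.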
This change adds support for the not equal operation for ComplexType llvm#141365
Summary: The allocator interface is supposed to have 16-byte alignment (to keep it consistent with the CPU allocator; we could probably drop this to 8 if desired). But this was not enforced, because the number of bytes used for the bitfield sometimes resulted in an alignment of 8 instead of 16. Explicitly align the number of bytes to a multiple of 16, even if unused.
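The rounding step described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the allocator's actual code: `alignUp` is a hypothetical helper, and the 24-byte bitfield size is an invented example of a size that is a multiple of 8 but not of 16.

```cpp
#include <cstddef>

// Rounds `bytes` up to the next multiple of `alignment`.
// `alignment` must be a power of two for the mask trick to be valid.
constexpr std::size_t alignUp(std::size_t bytes, std::size_t alignment = 16) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// A hypothetical 24-byte bitfield would leave the data that follows it only
// 8-byte aligned; padding it to 32 bytes restores 16-byte alignment.
static_assert(alignUp(24) == 32, "24 bytes are padded to 32");
static_assert(alignUp(32) == 32, "already aligned sizes are unchanged");
```

The padding bytes are unused, which matches the commit's note that the size is aligned "even if unused".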
PR llvm#141106 changed the debuginfo metadata to allow dynamic bit offsets and sizes. This caused a crash in lld when using LTO. The problem is that lazyLoadOneMetadata assumes that the metadata in question can be cast to MDNode; but in the typical case where the offset is a constant, this is not true. This patch changes this spot to allow non-MDNodes through. The included test case comes from the report in llvm#141106.
…BB_ADDR_MAP_V0). (llvm#146186) Version 2 was added more than two years ago (llvm@6015a04). So it should be safe to deprecate older versions.
This patch fixes: lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:415:7: error: label at end of compound statement is a C++23 extension [-Werror,-Wc++23-extensions] lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:536:7: error: label at end of compound statement is a C++23 extension [-Werror,-Wc++23-extensions] lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:672:7: error: label at end of compound statement is a C++23 extension [-Werror,-Wc++23-extensions]
This patch introduces a new custom type `!spirv.arm.tensor<>` to the MLIR SPIR-V dialect to represent `OpTypeTensorARM` as defined in the `SPV_ARM_tensors` extension. The type models a shaped tensor with element type and optional shape, and implements the `ShapedType` interface to enable reuse of MLIR's existing shape-aware infrastructure. The type supports serialization to and from SPIR-V binary as `OpTypeTensorARM`, and emits the required capability (`TensorsARM`) and extension (`SPV_ARM_tensors`) declarations automatically. This addition lays the foundation for supporting structured tensor values natively in SPIR-V and will enable future support for operations defined in the `SPV_ARM_tensors` extension, such as `OpTensorReadARM`, `OpTensorWriteARM`, and `OpTensorQuerySizeARM`. Reference: KhronosGroup/SPIRV-Registry#342 --------- Signed-off-by: Davide Grohmann <[email protected]> Signed-off-by: Mohammadreza Ameri Mahabadian <[email protected]>
…/isGuaranteedNotToBeUndefOrPoisonForTargetNode (llvm#146728) None of these implicitly generate UNDEF/POISON
The only use of Receiver is to initialize RecExpr. This patch renames Receiver to RecExpr while removing the cast statement.
) This patch fixes the following error: ``` llvm/lib/Support/TextEncoding.cpp:274:11: error: cannot initialize a variable of type 'char *' with an rvalue of type 'const char *' 274 | char *Input = InputLength ? const_cast<char *>(Source.data()) : ""; | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ```
In DXC, there is an option to enable all KHR extensions. I would like to extend the existing `-spirv-ext` backend command-line option to have the same capability. It is like the special case for `all`, except it only adds the `SPV_KHR_*` extensions. Part of llvm#137650.
This reverts commit 988876c. Was intended to be a PR
unbreak gcc CI bots.
Refactors new/delete interceptor macros per the discussion in llvm#145087. Signed-off-by: Justin King <[email protected]>
…on (llvm#138144) Background: https://discourse.llvm.org/t/rfc-explaining-release-package-types-and-purposes/85985 So that users can understand which they should use, particularly for Windows. The original text about community builds is kept, after explaining the main release package formats. In addition, explain how to use gpg or gh to verify the packages.
…lvm#146909) The only difference is that with libc++ the summary string contains the dereferenced pointer value. With libstdc++ we currently display the pointer itself, which seems redundant. E.g.,
```
(std::unique_ptr<int>) iup = 0x55555556d2b0 { pointer = 0x000055555556d2b0 }
(std::unique_ptr<std::basic_string<char> >) sup = 0x55555556d2d0 { pointer = "foobar" }
```
This patch moves the logic into a common helper that's shared between the libc++ and libstdc++ formatters. After this patch we can combine the libc++ and libstdc++ API tests (see llvm#146740).
…ch64 macOS version
Currently failing on the arm64 macOS CI with:
```
06:59:37 Traceback (most recent call last):
06:59:37 File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/test/API/commands/frame/var-dil/basics/GlobalVariableLookup/TestFrameVarDILGlobalVariableLookup.py", line 47, in test_frame_var
06:59:37 self.expect_var_path("ExtStruct::static_inline", value="16")
06:59:37 File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2589, in expect_var_path
06:59:37 value_check.check_value(self, eval_result, str(eval_result))
06:59:37 File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 301, in check_value
06:59:37 test_base.assertSuccess(val.GetError())
06:59:37 File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2597, in assertSuccess
06:59:37 self.fail(self._formatMessage(msg, "'{}' is not success".format(error)))
06:59:37 AssertionError: '<user expression 0>:1:1: use of undeclared identifier 'ExtStruct::static_inline'
06:59:37 1 | ExtStruct::static_inline
06:59:37 | ^' is not success
06:59:37 Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/lldb-build/bin/clang
06:59:37 ----------------------------------------------------------------------
06:59:37 Ran 1 test in 2.322s
06:59:37
```
Can't repro this locally so skipping on older macOS versions that the CI
is running.
The inheritance hierarchy for `llvm::ms_demangle::Node` ([doxygen](https://llvm.org/doxygen/structllvm_1_1ms__demangle_1_1Node.html)) is a bit more involved. One thing that's missing without RTTI is the ability to determine if a node is a symbol, identifier, or type (or one would need to check for every kind). This PR adds support for `dyn_cast`, `isa`, and friends to `llvm::ms_demangle::Node`. As the type already has a `kind()`, this mainly adds `classof` to the nodes as well as some start and end markers in the `NodeKind` enum.
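The `classof` pattern with range markers that the commit above describes can be sketched without any LLVM headers. Everything named here is hypothetical (`Kind`, `TypeNode`, `PointerTypeNode`, `IdentifierNode`, and the tiny `isa` / `dyn_cast` helpers); in LLVM itself a class only supplies `classof`, and the real `isa` and `dyn_cast` templates come from `llvm/Support/Casting.h`.

```cpp
// Hypothetical node kinds, ordered so that each intermediate base class
// covers one contiguous range of enumerators.
enum class Kind {
  PrimitiveType,
  PointerType,
  NamedIdentifier,
  OperatorIdentifier,
  // Range markers: aliases for the first and last kind of each group.
  TypeFirst = PrimitiveType,
  TypeLast = PointerType,
  IdentifierFirst = NamedIdentifier,
  IdentifierLast = OperatorIdentifier,
};

class Node {
public:
  explicit Node(Kind k) : kind_(k) {}
  Kind kind() const { return kind_; }

private:
  Kind kind_;
};

// Intermediate base: matches any kind inside the type range.
struct TypeNode : Node {
  explicit TypeNode(Kind k) : Node(k) {}
  static bool classof(const Node *n) {
    return n->kind() >= Kind::TypeFirst && n->kind() <= Kind::TypeLast;
  }
};

// Leaf class: matches exactly one kind.
struct PointerTypeNode : TypeNode {
  PointerTypeNode() : TypeNode(Kind::PointerType) {}
  static bool classof(const Node *n) { return n->kind() == Kind::PointerType; }
};

struct IdentifierNode : Node {
  explicit IdentifierNode(Kind k) : Node(k) {}
  static bool classof(const Node *n) {
    return n->kind() >= Kind::IdentifierFirst &&
           n->kind() <= Kind::IdentifierLast;
  }
};

// Minimal stand-ins for the LLVM casting templates, built only on classof.
template <class To> bool isa(const Node *n) { return To::classof(n); }

template <class To> const To *dyn_cast(const Node *n) {
  return isa<To>(n) ? static_cast<const To *>(n) : nullptr;
}
```

With the markers in place, asking "is this node a type?" is a single range comparison rather than a check against every individual kind, which is the gap the commit message points out.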
…lvm#141937) RFC on discourse: https://discourse.llvm.org/t/rfc-debug-info-for-coroutine-suspension-locations-take-2/86606
With this commit, we add `DILabel` debug info to the resume points of a coroutine. Those labels can be used by debugging scripts to figure out the exact line and column at which a coroutine was suspended by looking up the current `__coro_index` value inside the coroutine's frame, and then searching for the corresponding label inside the coroutine's resume function. The DWARF information generated for such a label looks like:
```
0x00000f71: DW_TAG_label
  DW_AT_name ("__coro_resume_1")
  DW_AT_decl_file ("generator-example.cpp")
  DW_AT_decl_line (5)
  DW_AT_decl_column (3)
  DW_AT_artificial (true)
  DW_AT_LLVM_coro_suspend_idx (0x01)
  DW_AT_low_pc (0x00000000000019be)
```
The labels can be mapped to their corresponding `__coro_idx` values either via their naming convention `__coro_resume_<N>` or using the new `DW_AT_LLVM_coro_suspend_idx` attribute. In gdb, those line numbers can be looked up using `info line -function my_coroutine -label __coro_resume_1`. LLDB unfortunately does not understand DW_TAG_label debug information yet.
Given this is an artificial compiler-generated label, I did apply the DW_AT_artificial tag to it. The DWARFv5 standard only allows that tag on type and variable definitions, but this is a natural extension and was also blessed in the RFC on discourse.
Also, this commit adds `DW_AT_decl_column` to labels, not only for coroutines but also for normal C and C++ labels. While not strictly necessary, I am doing so now because it would be harder to do so later without breaking the binary LLVM-IR format.
Drive-by fixes: While reading the existing test cases to understand how to write my own test case, I did a couple of small typo fixes and comment improvements.
This patch is part of a series that adds origin-tracking to the debugify source location coverage checks, allowing us to report symbolized stack traces of the point where missing source locations appear. This patch completes the feature, having debugify handle origin stack traces by symbolizing them when an associated bug is found and printing them into the JSON report file as part of the bug entry. This patch also updates the script that parses the JSON report and creates a human-readable HTML report, adding an "Origin" entry to the table that contains an expandable textbox containing the symbolized stack trace.
The Buildkite CI was unintentionally disabled for a few weeks. This patch fixes the CI jobs now that it has been re-enabled.
The use case for `__is_same_uncvref` seems rather dubious, since not a single use needed the `remove_cvref_t` to be applied to both of the arguments. Removing the alias makes it clearer what actually happens, since we're not using an internal name anymore and it's clear what the `remove_cvref_t` should apply to.
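The point of the commit above can be shown with standard type traits. This is a hedged sketch: `remove_cvref_sketch_t` and `is_same_uncvref_sketch` are hypothetical names, and the assumption that the removed alias stripped cv-qualifiers and references from both arguments is inferred from the commit message rather than taken from the libc++ sources.

```cpp
#include <type_traits>

// C++11-compatible equivalent of std::remove_cvref_t (hypothetical name).
template <class T>
using remove_cvref_sketch_t =
    typename std::remove_cv<typename std::remove_reference<T>::type>::type;

// Assumed shape of the removed alias: both sides are stripped, so the call
// site no longer shows which argument actually needed it.
template <class T, class U>
struct is_same_uncvref_sketch
    : std::is_same<remove_cvref_sketch_t<T>, remove_cvref_sketch_t<U>> {};

// Explicit form: only the first argument is stripped, and that is visible
// where the trait is used.
template <class T, class U>
struct is_same_after_uncvref_first
    : std::is_same<remove_cvref_sketch_t<T>, U> {};

// Both forms agree when the second argument is already a plain type.
static_assert(is_same_uncvref_sketch<const int &, int>::value, "");
static_assert(is_same_after_uncvref_first<const int &, int>::value, "");

// They differ when the second argument carries qualifiers: the two-sided
// form silently accepts it, while the explicit form does not.
static_assert(is_same_uncvref_sketch<int, const int &>::value, "");
static_assert(!is_same_after_uncvref_first<int, const int &>::value, "");
```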
These changes were split off from llvm#146503. This commit makes the output directories of libclc artefacts explicit. It creates a variable for the final output directory - LIBCLC_OUTPUT_LIBRARY_DIR - which has not changed. This allows future changes to alter the output directory more simply, such as by pointing it to somewhere inside clang's resource directory. This commit also changes the output directory of each target's intermediate builtins.*.bc files. They are now placed into each respective libclc target's object directory, rather than the top-level libclc binary directory. This should help keep the binary directory a bit tidier.
This extension extends the subgroup block read and write functions defined by `cl_intel_subgroups` (and, when supported, `cl_intel_subgroups_char`, `cl_intel_subgroups_short`, and `cl_intel_subgroups_long`) to support reading from and writing to pointers to the `__local` memory address space in addition to pointers to the `__global` memory address space. It is already supported by the Intel OpenCL compiler. Co-authored-by: Victor Mustya <[email protected]>
The prepare target was depending on the output of a custom command, but wasn't the full path to that file. This tripped up CMake if the file was removed as it didn't know how to rebuild that file.
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
8b552fa to 687ea05
|
Hi @tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel, @sergey-semenov, I created another PR for these changes, since the initial (this) PR was created against an old branch made for internal review and differs too much from the branch published to the community. |
Tracked at llvm#112294. This patch partially implements [basic.link]p14 through [basic.link]p18. The explicitly missing parts are:
- Anything related to specializations.
- Deciding at compile time whether a pointer is associated with a TU-local value.
- [basic.link]p15.1.2, to decide whether a type is TU-local.
- Diagnosing TU-local functions from other TUs that are collected into the overload set. See [basic.link]p19, the call to 'h(N::A{});' in translation unit #2.
There are likely other implicitly missing parts, as the wording uses "names" briefly several times. But to implement this precisely, we would have to visit the whole AST, including Decls, Expressions and Types, which may be harder to implement and more costly in compilation time. So I chose to implement the common parts. It won't be too bad to miss some cases, since we DIDN'T do any such checks in the past 3 years. Any new check is an improvement. Given that modules have been basically available since clang15 without such checks, it would be user-unfriendly to give a hard error now. And there are a lot of cases where violating the rule is actually just fine. So I decided to emit warnings instead of hard errors.
Extend support in LLDB for WebAssembly. This PR adds a new Process plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly targets. It adds support for WebAssembly's memory model with separate address spaces, and the ability to fetch the call stack from the WebAssembly runtime. I have tested this change with the WebAssembly Micro Runtime (WAMR, https://github.com/bytecodealliance/wasm-micro-runtime) which implements a GDB debug stub and supports the qWasmCallStack packet.
```
(lldb) process connect --plugin wasm connect://localhost:4567
Process 1 stopped
* thread #1, name = 'nobody', stop reason = trace
    frame #0: 0x40000000000001ad wasm32_args.wasm`main:
->  0x40000000000001ad <+3>:  global.get 0
    0x40000000000001b3 <+9>:  i32.const 16
    0x40000000000001b5 <+11>: i32.sub
    0x40000000000001b6 <+12>: local.set 0
(lldb) b add
Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
    frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
   1    int
   2    add(int a, int b)
   3    {
-> 4      return a + b;
   5    }
   6
   7    int
(lldb) bt
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
  * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
    frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12
    frame #2: 0x40000000000001fe wasm32_args.wasm
```
This PR is based on an unmerged patch from Paolo Severini: https://reviews.llvm.org/D78801. I intentionally stuck to the foundations to keep this PR small. I have more PRs in the pipeline to support the other features/packets. My motivation for supporting Wasm is to support debugging Swift compiled to WebAssembly: https://www.swift.org/documentation/articles/wasm-getting-started.html
Pointers and GEP are untyped. SPIR-V requires structured OpAccessChain. This means the backend will have to determine a good way to retrieve the structured access from an untyped GEP. This is not a trivial problem, and it needs to be addressed to have a robust compiler. The issue is that other workstreams rely on the access chain deduction to work. So we have 2 options:
- pause all dependent work until we have a good chain deduction.
- submit this limited fix so we can work on both this and other features in parallel.
The choice we want to make is #2: submitting this **knowing this is not a good** fix. It only increases the number of patterns we can work with, thus allowing others to continue working on other parts of the backend. This patch as-is has many limitations:
- It cannot robustly determine the depth of the structured access from a GEP. Fixing this would require looking ahead at the full GEP chain.
- It cannot always figure out the correct access indices, especially with dynamic indices. This will require frontend collaboration.
Because we know this is a temporary hack, this patch only impacts the logical SPIR-V target. Physical SPIR-V, which can rely on pointer casts, remains on the old method. Related to llvm#145002
…lvm#152156) For the new A320 in-order core, we follow the addition of the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (llvm#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```
When compiling without the feature enabled, we get:
```
...
ld1b { z0.b }, p0/z, [x0, x10]
ld1b { z2.b }, p0/z, [x1, x10]
add x12, x0, x10
ldr z1, [x12, #1, mul vl]
add x12, x1, x10
ldr z3, [x12, #1, mul vl]
fadd z0.s, z2.s, z0.s
add x12, x2, x10
fadd z1.s, z3.s, z1.s
dech x11
st1b { z0.b }, p0, [x2, x10]
incb x10, all, mul #2
str z1, [x12, #1, mul vl]
...
```
When compiling with the feature enabled, we get:
```
...
ldp q0, q1, [x12, #-16]
ldp q2, q3, [x11, #-16]
subs x13, x13, #8
fadd v0.4s, v2.4s, v0.4s
fadd v1.4s, v3.4s, v1.4s
add x11, x11, #32
add x12, x12, #32
stp q0, q1, [x10, #-16]
add x10, x10, #32
...
```
Need this as `mlir/dialects/transform/smt.py` imports it:
```py
from .._transform_smt_extension_ops_gen import *
from .._transform_smt_extension_ops_gen import _Dialect
```
This is part of the SYCL support upstreaming effort. The relevant RFCs can
be found here:
https://discourse.llvm.org/t/rfc-add-full-support-for-the-sycl-programming-model/74080
https://discourse.llvm.org/t/rfc-sycl-runtime-upstreaming/74479
The SYCL runtime is device-agnostic and uses Unified Runtime (oneapi-src/unified-runtime on GitHub) as an external dependency. Unified Runtime serves as an interface layer between the SYCL runtime and device-specific backends, and it has several adapters that bind to various backends.
NOTE: UR is considered a temporary solution until llvm-project/offload is fully functional and able to replace UR.
This commit adds: fetching UR, building UR as a dependency, and a document with a short overview of UR with links to repos and documentation.